Streaming, Memory-Limited PCA
Authors
Abstract
In this paper, we consider a streaming, one-pass-over-the-data model for Principal Component Analysis (PCA). The input is a stream of p-dimensional vectors, and the output is a collection of k p-dimensional principal components that span the best approximating subspace. Because the output alone occupies kp numbers, the minimum memory requirement for such problems is O(kp). Yet the standard PCA algorithm requires us to form the empirical covariance matrix, typically a dense p × p matrix, hence requiring O(p^2) memory. Although several incremental algorithms require only O(kp) memory, to the best of our understanding these methods have no known finite-sample performance bounds. That is, in the high-dimensional setting where the number of samples and the dimensionality scale together, there is no known provably correct algorithm. This paper considers this simple but important problem. We give what is, to the best of our knowledge, the first streaming algorithm that requires only O(kp) memory, makes a single pass over the data, and matches the performance of the standard batch algorithm up to logarithmic factors.
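The memory argument in the abstract can be made concrete with a small sketch. The abstract does not describe the algorithm itself, so the code below is only an illustration of one generic way to stay within O(kp) memory: a block-wise power iteration that applies each sample as a rank-one update to a p × k accumulator and never forms the p × p covariance matrix. The function names, the `block_size` parameter, and the synthetic data model are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def streaming_block_power_pca(stream, p, k, block_size, seed=0):
    """One-pass estimate of a k-dimensional principal subspace.

    Illustrative sketch only (hypothetical names). Memory use is two
    p x k arrays, i.e. O(kp); the p x p covariance is never built.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis, shape (p, k).
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    S = np.zeros((p, k))  # running sum of x (x^T Q) over the current block
    count = 0
    for x in stream:
        # Rank-one update: equals (x x^T) Q, computed in O(kp) time and space.
        S += np.outer(x, x @ Q)
        count += 1
        if count == block_size:
            # One power-iteration step per block, then re-orthonormalize.
            Q, _ = np.linalg.qr(S / block_size)
            S[:] = 0.0
            count = 0
    # Samples in a trailing, incomplete block are ignored in this sketch.
    return Q


def sample_spiked_stream(A, n, noise_std, seed=1):
    """Yield n samples x = A z + noise (an assumed synthetic data model)."""
    rng = np.random.default_rng(seed)
    p, k = A.shape
    for _ in range(n):
        yield A @ rng.standard_normal(k) + noise_std * rng.standard_normal(p)


if __name__ == "__main__":
    p, k = 40, 3
    rng = np.random.default_rng(42)
    A, _ = np.linalg.qr(rng.standard_normal((p, k)))  # true subspace
    Q = streaming_block_power_pca(
        sample_spiked_stream(A, n=25000, noise_std=0.5), p, k, block_size=5000
    )
    # Spectral norm of the part of A lying outside span(Q).
    print("subspace error:", np.linalg.norm(A - Q @ (Q.T @ A), 2))
```

The block size trades off the two sources of error in this sketch: larger blocks give a less noisy covariance estimate per step, while more blocks give more power-iteration steps for a fixed number of samples.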
Similar resources
Memory Limited, Streaming PCA
We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p^2) memory; meanwhile no algorithm can do better than O(kp) memory, since this is what the output it...
Streaming PCA with Many Missing Entries
We consider the streaming memory-constrained principal component analysis (PCA) problem with missing entries, where the available storage is linear in the dimensionality of the problem, and each vector has so many missing entries that matrix completion is not possible. SVD-based methods cannot work because of the memory constraint, while imputation-based updates fail when faced with too many er...
A scalable supervised algorithm for dimensionality reduction on streaming data
Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are greatly interesting due to the desirability of real tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification pro...
Stream-based Hebbian eigenfilter for real-time neuronal spike discrimination
BACKGROUND: Principal component analysis (PCA) has been widely employed for automatic neuronal spike sorting. Calculating principal components (PCs) is computationally expensive, and requires complex numerical operations and large memory resources. Substantial hardware resources are therefore needed for hardware implementations of PCA. General Hebbian algorithm (GHA) has been proposed for calcul...
Incremental kernel PCA and the Nyström method
Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend o...
Publication date: 2013